Phonetic annotation of a non-native speech corpus

Authors

  • Patrizia Bonaventura
  • Peter Howarth
  • Wolfgang Menzel
Abstract

Annotating non-native speech on a phonetic level is an extremely labour-intensive task and therefore requires a proper balance between the expected benefit and the resources needed. This paper reports on the experience gained when collecting and annotating a corpus of English sentences recorded by students with Italian and German as their mother tongue. The annotated data were used intensively during the development phase of a language learning tool, which allows automatic diagnosis of pronunciation errors and gives corrective feedback to the learner. Suggestions for further improvement in the annotation procedure will be presented, based on the experience acquired in creating the corpus.

Similar resources

Transcription and annotation of a Japanese accented spoken corpus of L2 Spanish for the development of CAPT applications

This paper addresses the process of transcribing and annotating spontaneous non-native speech with the aim of compiling a training corpus for the development of Computer Assisted Pronunciation Training (CAPT) applications, enhanced with Automatic Speech Recognition (ASR) technology. To better adapt ASR technology to CAPT tools, the recognition systems must be trained with non-native corpora tra...

iCALL corpus: Mandarin Chinese spoken by non-native speakers of European descent

We present iCALL, a speech corpus designed to evaluate Mandarin Chinese pronunciation patterns of non-native speakers of European descent, developed at the Institute for Infocomm Research (I²R) in Singapore. To the best of our knowledge, iCALL is larger than any reported non-native corpora to date in terms of utterance number, duration, and number of speakers: iCALL consists of 90,841 utterances...

A corpus-based analysis of transfer effects and connected speech processes in Vietnamese English

This paper presents a corpus-based descriptive analysis of the most prevalent transfer effects and connected speech processes observed in a comparison of 11 Vietnamese English speakers (6 females, 5 males) and 12 Australian English speakers (6 males, 6 females) over 24 grammatical paraphrase items. The phonetic processes are segmentally labelled in terms of IPA diacritic features using the EMU ...

CoRuSS - a New Prosodically Annotated Corpus of Russian Spontaneous Speech

This paper describes speech data recording, processing and annotation of a new speech corpus CoRuSS (Corpus of Russian Spontaneous Speech), which is based on connected communicative speech recorded from 60 native Russian male and female speakers of different age groups (from 16 to 77). Some Russian speech corpora available at the moment contain plain orthographic texts and provide some kind of ...

Automatic accentedness evaluation of non-native speech using phonetic and sub-phonetic posterior probabilities

Automatic evaluation of non-native speech accentedness has potential implications for not only language learning and accent identification systems but also for speaker and speech recognition systems. From the perspective of speech production, the two primary factors influencing the accentedness are the phonetic and prosodic structure. In this paper, we propose an approach for automatic accented...

Publication date: 2000